Named Entity Recognition in Assamese

نویسندگان

  • Padmaja Sharma
  • Utpal Sharma
  • Jugal Kalita
  • Nancy Chinchor
  • Eric Brown
  • Lisa Ferro
  • John Lafferty
  • Andrew McCallum
  • Scott Miller
  • Michael Crystal
  • Heidi Fox
  • Lance Ramshaw
  • Richard Schwartz
چکیده

Named Entity Recognition is a process through which a program extracts proper nouns in texts and associates them with a proper tag. NER has made significant progress in European languages, but in Indian languages due to the lack of effort as well as proper resources, it remains a challenging task. Recognizing ambiguities and assigning the correct tags to the names is the main goal of NER. Thus NER can be defined as identification of the proper nouns and the classification of these nouns into classes such as person, location, organization and miscellaneous including date, time and year. The main aim of this work is to develop a computational system that can perform NER in text in Assamese, which is a resource poor Indo-Aryan language. This article present an overview of NER and its issues in the context of Assamese, and also the work done in Assamese using various approaches.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Survey of Named Entity Recognition in Assamese and other Indian Languages

Named Entity Recognition is always important when dealing with major Natural Language Processing tasks such as information extraction, question-answering, machine translation, document summarization etc so in this paper we put forward a survey of Named Entities in Indian Languages with particular reference to Assamese. There are various rule-based and machine learning approaches available for N...

متن کامل

The first Steps towards Assamese Named Entity Recognition

Named Entity Recognition (NER) is the process of identifying and classifying proper nouns in text documents into pre-defined classes such as person, location and organization. Although NER in Indian languages is a di cult and challenging task and su↵ers from scarcity of resources, such work has started to appear recently. In this paper we present an overview of NER, the di↵erent approaches to N...

متن کامل

بهبود شناسایی موجودیت‌های نامدار فارسی با استفاده از کسره اضافه

Named entity recognition is a process in which the people’s names, name of places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), date, currency and percentages in a text are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...

متن کامل

A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Named Entity Recognition is an information extraction technique that identifies name entities in a text. Three popular methods have been conventionally used namely: rule-based, machine-learning-based and hybrid of them to extract named entities from a text. Machine-learning-based methods have good performance in the Persian language if they are trained with good features. To get good performanc...

متن کامل

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for consequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, extraction of the names of the molecules and their properties. Improvement in the performance of such systems may affects the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016